Indirect Spatial Data Extraction from Web Documents

Authors

  • Dimitar Blagoev
  • George Totkov
  • Milena Staneva
  • Krassimira Ivanova
  • Krassimir Markov
  • Peter Stanchev
Abstract

The paper outlines an approach for indirect spatial data extraction from web documents written in Bulgarian, based on learning restricted finite-state automata. It uses heuristics to generalize an initial finite-state automaton, which recognizes only the positive examples and nothing else, into an automaton that recognizes as large a language as possible without extracting any non-positive examples from the training data set. The learning method, its software implementation, and experiments are presented. The investigation follows the rules of the EU INSPIRE Network.
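
The abstract does not spell out the heuristics, so the following is only a minimal sketch of the general idea, assuming a generic greedy state-merging strategy rather than the authors' method. It starts from a prefix-tree automaton that accepts exactly the positive examples and keeps a merge of two states only if no negative example becomes accepted. All function names and the token sequences are hypothetical and chosen for illustration.

```python
from itertools import combinations


def build_prefix_tree(positives):
    """Build an automaton that accepts exactly the positive examples."""
    transitions = {0: {}}  # state -> symbol -> set of next states
    accepting = set()
    for example in positives:
        state = 0
        for symbol in example:
            targets = transitions[state].setdefault(symbol, set())
            if not targets:
                new_state = len(transitions)
                transitions[new_state] = {}
                targets.add(new_state)
            state = next(iter(targets))  # the tree has one target per symbol
        accepting.add(state)
    return transitions, accepting


def accepts(transitions, accepting, sequence):
    """Simulate the (possibly nondeterministic) automaton from state 0."""
    current = {0}
    for symbol in sequence:
        current = {t for s in current for t in transitions[s].get(symbol, ())}
        if not current:
            return False
    return bool(current & accepting)


def merge(transitions, accepting, keep, drop):
    """Return a copy of the automaton with state `drop` folded into `keep`."""
    def rename(state):
        return keep if state == drop else state

    merged = {}
    for state, edges in transitions.items():
        out = merged.setdefault(rename(state), {})
        for symbol, targets in edges.items():
            out.setdefault(symbol, set()).update(rename(t) for t in targets)
    return merged, {rename(s) for s in accepting}


def generalize(positives, negatives):
    """Greedily merge states while every negative example stays rejected."""
    transitions, accepting = build_prefix_tree(positives)
    changed = True
    while changed:
        changed = False
        # keep < drop in every pair, so the start state 0 is never dropped.
        for keep, drop in combinations(sorted(transitions), 2):
            cand_t, cand_a = merge(transitions, accepting, keep, drop)
            if not any(accepts(cand_t, cand_a, n) for n in negatives):
                transitions, accepting = cand_t, cand_a
                changed = True
                break
    return transitions, accepting


# Hypothetical token sequences; NAME stands for a recognized place-name token.
positives = [("city", "of", "NAME"), ("city", "of", "NAME", "NAME"),
             ("village", "of", "NAME")]
negatives = [("city", "NAME"), ("of", "NAME"), ("city", "of"), ()]

transitions, accepting = generalize(positives, negatives)
print(len(transitions), "states after generalization")
```

Merging states can only enlarge the accepted language, so every positive example remains accepted, and the check inside `generalize` ensures that none of the supplied negative examples is ever accepted. The result depends on the order in which merges are tried; a real system would replace the blind pair enumeration with domain-specific heuristics.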

Similar resources

SXPath - Extending XPath towards Spatial Querying on Web Documents

Querying data from presentation formats like HTML, for purposes such as information extraction, requires the consideration of tree structures as well as the consideration of spatial relationships between laid out elements. The underlying rationale is that frequently the rendering of tree structures is very involved and undergoing more frequent updates than the resulting layout structure. Theref...

Entity ranking using click-log information

Log information describing the items the users have selected from the set of answers a query engine returns to their queries constitute an excellent form of indirect user feedback that has been extensively used in the web to improve the effectiveness of search engines. In this work we study how the logs can be exploited to improve the ranking of the results returned by an entity search engine. ...

Knowledge Extraction from Web Documents Using Self-Organizing Neural Networks

Knowledge discovery is defined as non-trivial extraction of implicit, previously unknown and potentially useful information from given data [1]. Knowledge extraction from web documents deals with unstructured, free-format documents whose number is enormous and rapidly growing.

Landmark Extraction: A Web Mining Approach

Landmarks play crucial roles in human geographic knowledge. There has been much work focusing on the extraction of landmarks from geographic information systems (GIS) or 3D city models. The extraction of landmarks from digital documents, however, has not been fully explored. The World Wide Web provides a rich source of region related information based on our understanding of geographic space. W...

OLERA: On-Line Extraction Rule Analysis for Semi-structured Documents

The vast amount of online information available has led to renewed interest in information extraction (IE) systems that analyze input documents to produce a structured representation of selected information from the documents. Information extraction from semistructured documents has been studied extensively recently. Most researches focus on supervised learning approaches where targets must be ...

Publication date: 2009